NSF PAR Search | NSF Public Access Repository


Medical image registration via neural fields

https://doi.org/10.1016/j.media.2024.103249

Sun, Shanlin; Han, Kun; You, Chenyu; Tang, Hao; Kong, Deying; Naushad, Junayed; Yan, Xiangyi; Ma, Haoyu; Khosravi, Pooya; Duncan, James S; et al (October 2024, Medical Image Analysis)

Full Text Available
Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts

https://doi.org/10.1109/CVPR52734.2025.01227

Liang, Feng; Ma, Haoyu; He, Zecheng; Hou, Tingbo; Hou, Ji; Li, Kunpeng; Dai, Xiaoliang; Juefei-Xu, Felix; Azadi, Samaneh; Sinha, Animesh; et al (June 2025, IEEE)

Free, publicly-accessible full text available June 10, 2026
Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

https://doi.org/10.1609/aaai.v37i7.26008

Kong, Zhenglun; Ma, Haoyu; Yuan, Geng; Sun, Mengshu; Xie, Yanyue; Dong, Peiyan; Meng, Xin; Shen, Xuan; Tang, Hao; Qin, Minghai; et al (June 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization. Previous compression algorithms usually start from pre-trained dense models and focus only on efficient inference, so time-consuming training is still unavoidable. In contrast, this paper points out that million-scale training data is redundant, which is the fundamental reason for the tedious training. To address the issue, this paper introduces sparsity into the data and proposes an end-to-end efficient training framework from three sparse perspectives, dubbed Tri-Level E-ViT. Specifically, we leverage a hierarchical data redundancy reduction scheme that explores sparsity at three levels: the number of training examples in the dataset, the number of patches (tokens) in each example, and the number of connections between tokens that lie in the attention weights. With extensive experiments, we demonstrate that our proposed technique can noticeably accelerate training for various ViT architectures while maintaining accuracy. Remarkably, under certain ratios, we are able to improve ViT accuracy rather than compromise it. For example, we achieve a 15.2% speedup with 72.6% (+0.4) Top-1 accuracy on DeiT-T, and a 15.7% speedup with 79.9% (+0.1) Top-1 accuracy on DeiT-S. This proves the existence of data redundancy in ViT. Our code is released at https://github.com/ZLKong/Tri-Level-ViT
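The three sparsity levels named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that): the importance scores, the keep ratios, and the helper names `keep_top` and `sparsify_attention_row` are all hypothetical, and a simple top-k rule stands in for whatever selection criterion the paper actually uses.

```python
import math

def keep_top(scores, keep_ratio):
    """Return the sorted indices of the highest-scoring entries.

    Keeps ceil(keep_ratio * n) entries, and always at least one.
    """
    k = max(1, math.ceil(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def sparsify_attention_row(weights, keep_ratio):
    """Zero all but the largest weights in one attention row, then renormalise."""
    kept = set(keep_top(weights, keep_ratio))
    pruned = [w if i in kept else 0.0 for i, w in enumerate(weights)]
    total = sum(pruned)
    return [w / total for w in pruned] if total > 0 else pruned

# Level 1: drop training examples (hypothetical per-example importance scores).
example_scores = [0.9, 0.1, 0.5, 0.7]
kept_examples = keep_top(example_scores, keep_ratio=0.5)

# Level 2: drop patches (tokens) inside one example (hypothetical token scores).
token_scores = [0.2, 0.8, 0.05, 0.6, 0.3, 0.1]
kept_tokens = keep_top(token_scores, keep_ratio=0.5)

# Level 3: drop token-to-token connections in one row of attention weights.
attn_row = [0.5, 0.3, 0.15, 0.05]
sparse_row = sparsify_attention_row(attn_row, keep_ratio=0.5)

print(kept_examples, kept_tokens, sparse_row)
```

Each level reduces work at a different granularity: fewer examples per epoch, fewer tokens per example, and fewer attention connections per token.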
Full Text Available
Training Your Sparse Neural Network Better with Any Mask

Jaiswal, Ajay; Ma, Haoyu; Chen, Tianlong; Ding, Ying; Wang, Zhangyang (July 2022, International Conference on Machine Learning (ICML))

Pruning large neural networks to create high-quality, independently trainable sparse masks, which can maintain similar performance to their dense counterparts, is very desirable due to the reduced space and time complexity. As research effort is focused on increasingly sophisticated pruning methods that lead to sparse subnetworks trainable from scratch, we argue for an orthogonal, under-explored theme: improving training techniques for pruned sub-networks, i.e., sparse training. Departing from the popular belief that only the quality of sparse masks matters for sparse training, in this paper we demonstrate an alternative opportunity: one can carefully customize the sparse training techniques to deviate from the default dense network training protocols, by introducing "ghost" neurons and skip connections at the early stage of training and by strategically modifying the initialization as well as the labels. Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks. By adopting our newly curated techniques, we demonstrate significant performance gains across various popular datasets (CIFAR-10, CIFAR-100, TinyImageNet), architectures (ResNet-18/32/104, VGG16, MobileNet), and sparse mask options (lottery ticket, SNIP/GRASP, SynFlow, or even random pruning), compared to the default training protocols, especially at high sparsity levels.
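For readers unfamiliar with the setting, the following minimal sketch shows what training with a fixed sparse mask means in its plainest form: a random mask is drawn once, and every update keeps the pruned weights at zero. It illustrates only the baseline "default" protocol that the abstract contrasts against; it does not implement the paper's ghost neurons, skip connections, or modified initialization and labels. The helper names `random_mask` and `masked_sgd_step` and all numeric values are hypothetical.

```python
import random

def random_mask(n, sparsity, seed=0):
    """Return a 0/1 mask of length n with round(sparsity * n) entries pruned at random."""
    rng = random.Random(seed)
    n_pruned = int(round(sparsity * n))
    pruned = set(rng.sample(range(n), n_pruned))
    return [0 if i in pruned else 1 for i in range(n)]

def masked_sgd_step(weights, grads, mask, lr=0.1):
    """Take one SGD step, then re-apply the mask so pruned weights stay at zero."""
    return [(w - lr * g) * m for w, g, m in zip(weights, grads, mask)]

# Ten weights at 80% sparsity: only two connections remain trainable.
mask = random_mask(10, sparsity=0.8)
weights = [1.0 * m for m in mask]   # masked initialisation
grads = [0.5] * 10                  # placeholder gradients for illustration
new_weights = masked_sgd_step(weights, grads, mask)

print(mask)
print(new_weights)
```

The mask is treated as given, which matches the abstract's framing: the recipe is meant to work with any mask, whether it comes from a lottery ticket search, SNIP/GRASP, SynFlow, or random pruning.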
Full Text Available
Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization

https://doi.org/10.1109/FPL57034.2022.00027

Li, Zhengang; Sun, Mengshu; Lu, Alec; Ma, Haoyu; Yuan, Geng; Xie, Yanyue; Tang, Hao; Li, Yanyu; Leeser, Miriam; Wang, Zhangyang; et al (August 2022, 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL))

Full Text Available
Late Breaking Results: FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization

Sun, Mengshu; Li, Zhengang; Lu, Alec; Ma, Haoyu; Yuan, Geng; Xie, Yanyue; Tang, Hao; Li, Yanyu; Leeser, Miriam; Wang, Zhangyang; et al (January 2022, Proceedings of the 59th Design Automation Conference (DAC))

Full Text Available